[POC] catalog-managed writes #1377
Open: zachschuermann wants to merge 7 commits into delta-io:main from zachschuermann:cm-write-poc-2
zachschuermann added a commit that referenced this pull request on Oct 22, 2025
## What changes are proposed in this pull request?
Adds a new required method, `copy_atomic(&self, src: &Url, dest: &Url) -> DeltaResult<()>`, to `StorageHandler`. This PR also adds support in the default engine via a simple (naive) GET-then-PUT. Note that I've elected to pursue the simple, correct approach here; we can attempt to optimize it in the future (and open a follow-up if others agree).
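The default-engine approach described above (read the source, then write the destination) can be sketched as follows. This is a minimal, std-only illustration, not the kernel's code: `InMemoryStore`, `put_if_absent`, and `CopyFailure` are hypothetical stand-ins, paths are plain strings instead of `Url`, and it assumes "atomic" means the destination is written in a single conditional PUT that fails if the destination already exists (the case the `CopyError` discussion singles out).

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::fmt;

/// Hypothetical error type for this sketch; the real method returns `DeltaResult<()>`.
#[derive(Debug, PartialEq)]
enum CopyFailure {
    SourceNotFound(String),
    DestinationAlreadyExists(String),
}

impl fmt::Display for CopyFailure {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CopyFailure::SourceNotFound(p) => write!(f, "Source file not found: {p}"),
            CopyFailure::DestinationAlreadyExists(p) => {
                write!(f, "Destination file already exists: {p}")
            }
        }
    }
}

impl std::error::Error for CopyFailure {}

/// Hypothetical in-memory object store standing in for a real storage backend.
#[derive(Default)]
struct InMemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl InMemoryStore {
    /// Unconditional PUT, used here only to seed the store.
    fn put(&mut self, path: &str, data: Vec<u8>) {
        self.objects.insert(path.to_string(), data);
    }

    /// GET: read the whole object into memory.
    fn get(&self, path: &str) -> Option<Vec<u8>> {
        self.objects.get(path).cloned()
    }

    /// Conditional PUT: write only if nothing exists at `path` yet.
    fn put_if_absent(&mut self, path: &str, data: Vec<u8>) -> Result<(), CopyFailure> {
        match self.objects.entry(path.to_string()) {
            Entry::Occupied(_) => Err(CopyFailure::DestinationAlreadyExists(path.to_string())),
            Entry::Vacant(slot) => {
                slot.insert(data);
                Ok(())
            }
        }
    }

    /// copy_atomic sketch: GET the source bytes, then conditionally PUT them at `dest`.
    fn copy_atomic(&mut self, src: &str, dest: &str) -> Result<(), CopyFailure> {
        let bytes = self
            .get(src)
            .ok_or_else(|| CopyFailure::SourceNotFound(src.to_string()))?;
        self.put_if_absent(dest, bytes)
    }
}
```

The conditional PUT is what keeps a second copy from silently overwriting the destination; a real object-store backend would presumably need an equivalent put-if-absent primitive to offer the same guarantee.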
~This implementation proposes a slight departure from existing `Engine` APIs: instead of returning a `DeltaResult<()>` we return `Result<(), CopyError>` with `CopyError` defined as:~
<details>
<summary>old pieces on CopyError omitted</summary>
```rust
#[derive(thiserror::Error, Debug)]
pub enum CopyError {
    #[error("Destination file already exists: {0}")]
    DestinationAlreadyExists(String),
    #[error(transparent)]
    Other(#[from] Box<dyn std::error::Error + Send + Sync>),
}
```
It captures the only things we care about from the `copy` API perspective: either the destination already exists, in which case we can return a clear error message telling the user their commit has already been published (publishing being the main use case of this API for now), _or_ we got back some other error whose details we don't need to inspect; we simply surface it to the user and fail the overall publish API.
I've used this PR as an opportunity to introduce an Engine API more aligned with our pursuit of finer-grained errors (especially for the `Engine` trait), but I'm happy to split it out if we think it's better to retain the existing `DeltaResult` pattern.
</details>
### Motivation
This PR will be used for commit publishing: copying staged commits to their published locations. See #1377 for some context on future usage.
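As a rough illustration of the publishing use case, the sketch below copies each staged commit to its published location in ascending version order and turns a `DestinationAlreadyExists` copy failure into an "already published" error. Published Delta commit files are named by the commit version zero-padded to 20 digits under `_delta_log/`; everything else here (`publish`, `PublishError`, the local `CopyError`, and the closure standing in for `copy_atomic`) is hypothetical and only meant to show the control flow.

```rust
/// Hypothetical copy error mirroring the two cases the CopyError discussion cares about.
#[derive(Debug, PartialEq)]
enum CopyError {
    DestinationAlreadyExists(String),
    Other(String),
}

/// Hypothetical publish error reported back to the user.
#[derive(Debug, PartialEq)]
enum PublishError {
    AlreadyPublished(u64),
    Copy(u64, String),
}

/// Published Delta commit files are named by the version zero-padded to 20 digits.
fn published_commit_path(table_root: &str, version: u64) -> String {
    format!("{}/_delta_log/{:020}.json", table_root.trim_end_matches('/'), version)
}

/// Copy each staged commit (version, staged path) to its published path, lowest version first.
/// `copy_atomic` is any function with copy-if-absent semantics (hypothetical signature).
fn publish<F>(
    table_root: &str,
    mut staged: Vec<(u64, String)>,
    mut copy_atomic: F,
) -> Result<Vec<String>, PublishError>
where
    F: FnMut(&str, &str) -> Result<(), CopyError>,
{
    staged.sort_by_key(|(version, _)| *version);
    let mut published = Vec::new();
    for (version, src) in staged {
        let dest = published_commit_path(table_root, version);
        match copy_atomic(src.as_str(), dest.as_str()) {
            Ok(()) => published.push(dest),
            // The destination exists: tell the user this commit was already published.
            Err(CopyError::DestinationAlreadyExists(_)) => {
                return Err(PublishError::AlreadyPublished(version))
            }
            // Any other failure is surfaced as-is and fails the whole publish.
            Err(CopyError::Other(reason)) => return Err(PublishError::Copy(version, reason)),
        }
    }
    Ok(published)
}
```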
### This PR affects the following public APIs
New required method on the `StorageHandler` trait: `copy_atomic`.
## How was this change tested?
New unit test for the default engine implementation.
## What changes are proposed in this pull request?
Catalog-managed writes POC. Not intended to be merged; this is just for end-to-end derisking. Major pieces:
- `copy` engine API (with default engine impl)
- `publish` APIs
- `LogSegment`: add `latest_published_commit` -> how to expose?
- `commit` API
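One possible shape for the `LogSegment` addition, purely as a hypothetical sketch to make the "how to expose?" question concrete: if the segment tracks which commits are published versus staged, `latest_published_commit` could return the highest published version, or `None` if the segment contains only staged commits. `CommitFile`, its `published` flag, and the `Option<u64>` return type are assumptions, not the POC's actual types.

```rust
/// Hypothetical view of one commit file in a log segment.
#[derive(Debug, Clone, PartialEq)]
struct CommitFile {
    version: u64,
    /// `false` for a staged commit that has not been published yet.
    published: bool,
}

/// Hypothetical log segment holding the commit files it was built from.
struct LogSegment {
    commits: Vec<CommitFile>,
}

impl LogSegment {
    /// Highest version whose commit has been published, or `None` if none have been.
    fn latest_published_commit(&self) -> Option<u64> {
        self.commits
            .iter()
            .filter(|c| c.published)
            .map(|c| c.version)
            .max()
    }
}
```

Returning a bare version keeps the API small; returning the `CommitFile` itself would be an alternative if callers also need the file's location.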